The Make2D-DB II Package

current version: 3.10.2 -- December 2011

The Make2D-DB II Site on ExPASy - For an up-to-date documentation, news and FAQ

Read-Me: Main

Introduction
Installation

before using the tool
the database and the Web server installation process
querying the new database
cross-references
package content
database schema
to do
contact

Introduction

The Make2D-DB II is an environment to create, convert, interconnect and keep up-to-date 2-DE databases. Being open source (cf. license), it is distributed totally free of charge.

The tool converts federated 2-DE databases from text format (flat file format), as well as spreadsheets (e.g. excel reports) or simple text reports, into a relational format. It also creates new relational databases from scratch. It runs on any UNIX-type operating system (Linux, Solaris/SunOS, IRIX). Being continuously developed, the tool is evolving in concert with the current Proteomics Standards Initiative.

Due to its concept, The Make2D-DB II ensures a very high consistency of data. It allows dynamic interconnection between any number of remote databases/projects and offers many other features. Those features include automatic data updates related to external data resources, dynamic cross-references to similar databases, an intuitive search engine with data visualization and exports in various formats.

With this tool, one can easily convert any existing personal 2-DE data into a more reliable format.

In addition to those features, the tool extends the concept of 2-DE databases to cover a more elaborate description of proteomic experiments and analysis. Currently, it can integrate various annotation documents (e.g. sample and gel preparation), as well as many common analysis file formats (e.g. Mass Spectrometry, mzData, etc..). References to other remote repositories can also be easily integrated (e.g. PRIDE for MS/MS related data). Users can also define their own types of annotations and easily integrate them within their data representation.

Even with no local data, any laboratory can easily build up an intuitive Web portal accessing as many remote 2-DE resources as desired.

The Web interface appearance can be personalized for everyone's taste. Data can be marked to be public, as well as fully or partially private. An administration Web interface, highly secured, makes external data integration, data export, data privacy control, database publication and versions' control a very easy task to perform.

Features:

The tool can easily run without any specific technical knowledge.

In its current version (2.50), it checks deeply the syntax and consistency of the data to be converted (if available), points out any encountered ambiguity or error and suggests corrections in many cases. If the maps are annotated with Melanie / ImageMaster^TM 2D Platinum software, exports from this software in XML or text reports are directly read, otherwise simple text files (referenced in this document as "reports"), spreadsheets (tab-delimited exports from excel, CSV) or SWISS-2DPAGE-like flat files are adequate.

Once the maps and spots' annotations are captured, the data is converted into a consistent relational database schema. This schema is composed of four distinct parts. One to hold the 'core' data (including the not-yet-public data, as well as all operations concerning data management), a second for publicly accessible data, a third that keeps track of all modifications applied to the database throughout its history, and finally a fourth part to hold common procedures needed by both the 'core' and the public schema.

Once installed, queries can then be addressed using the provided Web interface. An interface that is easily configurable and adaptable for personal tests and needs. A graphical viewer is also included to display maps, spot annotations and mass spectrometry peak lists. The provided interface has the ability to query simultaneously several local or remote databases all at once and is also able to exchange data with other similar interfaces (for interoperability). This particularity means that queries may be addressed to several data resources simultaneously. Results are then consolidated before being displayed to the user.

The interface offers an additional feature, which is the possibility to extract (in real time) many types of objects or search results in different formats, so they may be imported into other programs, or simply be viewed and integrated as is in other displayers or into distinct Web pages. In addition, there is also an optional URI-like syntax meant to extract objects or launch search queries in a rather intuitive manner. Several attributes can then be given to define how to serve data (e.g. the output format) and what it is needed for (e.g. extract results for a textual export or just to be graphically viewed).

A main feature of the tool is that it can gather, integrate and update data relative to external information from the Web in a complete transparent manner. Dynamic cross-references to similar data resources are also automatically provided. This data can continuously and easily be managed and kept up-to-date.

A more elaborated update interface shall be provided in the future to cover all the management details of local data. If you register and install the tool then you will be notified for all major changes and updates. The forthcoming versions of the tool should always allow you to easily convert from a previous installation to an up-to-date one in an almost transparent manner.

Some comments:

You should be aware that the pre-defined queries of this version do not take full advantage of all the strength given by the new data model. Any suggestion from users for pre-defined queries are more than welcome.

Due to the fact that we aim to convert already existing data into a newly designed relational format, some flexibility has been intentionally adopted to suit the commonly used federated SWISS-2DPAGE syntax. The goal being to make the conversion as smooth as possible and to prevent users from re-writing their data whenever some more precision would have been wished. Subsequent versions of the tool should gradually work with, and present data, based in a 'map/spot-like' view rather than 'entries-like' one.

The graphic interface has been tested with a range of different configurations, with different browsers (Explorer, Mozilla, Netscape, FireFox,..) on different operating systems (Linux, Windows, MAC,..). In the meanwhile, we would highly appreciate your remarks concerning any inconvenient behavior you may encounter.

Some facts about the Web query interface:

The Web query interface is intended to query any number of databases built with this tool. It is not limited to query only one database, as it can hold simultaneously as many databases as one may wish. It is not required that those databases are to be installed on the same server. The interface can address remote databases as long as their Web addresses or connection parameters are known and accessible. This means that you can still use this tool to host a portal interface that queries several remote databases (e.g. WORLD-2DPAGE like portal), even without building your own. To use the tool for this purpose, you will only need to generate a server configuration file, as described later, and to launch the tool with the appropriate server installation options.

[top]

Installation

1.1 BEFORE USING THE TOOL, YOU SHOULD:

* many of the following tasks are usually to be performed by the system administrator

Install Perl version 5.6 or higher (Perl/ActivePerl 5.8 or 5.10 recommended), make sure you also include the perl DBI module and the PostgreSQL driver DBD::Pg (check for DBD::Pg that it has been compiled with the POSTGRES_LIB variable pointing to the correct version of your postgreSQL libraries).

- Some few old versions of the perl DBI module were not working properly with the Web interface. If you encounter any problem at this level, upgrade to a recent version of DBI. We should also mention a detected bug found in version 1.13 of the XML-XPath perl module.

- Included with the package is an adapted version of the Phenyx module InSilicoSpectro, which is used to read mass spectrometry files from a range of various formats. A special version has been specifically arranged to work with this package with a minimum set of installation requirements! In some cases, you may need to install the Compress-Zlib module from CPAN if you plan to read and include data from your MS files and you receive a error complaining about Zlib missing. A recent copy of this module is also to be found within this package distribution ( libs/Compress-Zlib.tar.gz ).

- In order to use the MelanieXMLParser module you should make sure you have the XML::Parser::PerlSAX (libxml-perl) perl module installed. Make also sure the XML::Twig (XML-Twig) perl module is also present if you want to use the Phenyx MS converter.

- Optional: in order to integrate the GDchart MS spectra viewer (GDchart.cgi), you should make sure the perl module "Chart " (http://www.cpan.org/modules/by-module/Chart/) is already installed in your system. If it is not, then you will need to install some or all of the following components (in the given order):

 - GD graphics library: http://www.boutell.com/gd/
 - zlib compression library: http://www.zlib.net/
 - libpng PNG reference library: http://www.libpng.org/pub/png/libpng.html
 - perl module GD: http://search.cpan.org/~mverb/GDGraph-1.43/Graph.pm)
 - perl module Chart: http://www.cpan.org/modules/by-module/Chart/, (see: http://search.cpan.org/src/CHARTGRP/Chart-2.4.1/Documentation.pdf)

Install and configure a HTTP server (Apache is highly recommended, and required if you plan to use the mod_rewrite redirection rules)

Install and run PostgreSQL server (except if you are only installing a Web portal for remote databases with no local data)

PostgreSQL is an open source object-relational database management system that can be found at: http://www.postgresql.org

Download and install postgreSQL version 9.1 or higher for your system (all distributions from version 7.3 to 9.1 have been fully tested).

To use postgreSQL, the holder of the postgres server daemon (who is typically the database superuser, generally called 'postgres' or 'pgsql') - see with your system admininstrator - should have launched and checked the following commands (or similar ones, assuming the bin files are in the /usr/local/pgsql/bin directory and the lib files in the /usr/local/pgsql/lib):

- Verify that the pgsql binaries are added to your PATH variable (e.g. by setting: PATH=$PATH:/usr/local/pgsql/bin)

- Export the environment variable "PGDATA" pointing to the directory where the database is to be located, as well as the environment variable "PGLIB" pointing to where the postgreSQL libraries are to be found.

    e.g. (depending on your system)
     $ export PGDATA=/a_path_to_store_your_database/postgresql (this path will be referred to as the [new_database_path])
     $ export PGLIB=/usr/local/pgsql/lib

- Initialize the database cluster (only once)

/usr/local/pgsql/bin/initdb -D [new_database_path] (-D "new_database_path" can be omitted if "PGDATA" is sweetly defined)

As you will need to allow non-local connections, you will probably have to add more "host" records in the generated configuration file pg_hba.conf created in the directory [new_database_path]. You should at least allow the "localhost" connections - used by the tool's Web interface - by making sure the line:

"host all all 127.0.0.1 255.255.255.255 [method]"
(often set as "host all all 127.0.0.1 255.255.255.255 trust" or "host all all 127.0.0.1/32 trust")

is *not commented* (by removing the '#' character at its beginning). You may keep or modify the trust parameter.
Methods can be "trust", "reject", "md5", "crypt", "password", "krb5", "ident", or "pam". Note that "password" sends passwords in clear text; "md5" is preferred since it sends encrypted passwords.
Caution! you may replace the default method "trust" to ensure a higher security by requiring a password when connecting to the database using the administration interface, e.g.
"host all all 127.0.0.1 255.255.255.255 md5"
or
"host all all 127.0.0.1/32 md5"

Also, remember TCP/IP connections are only enabled if you do not disable "tcpip_socket" in the file postgresql.conf (postgreSQL 7.4 or lower). For postgreSQL 8.0 or higher, connections are enabled with "listen_addresses = 'localhost'" and "port = 5432" or similar).

- Start the postmaster process (the database server) by:

nohup /usr/lib/postgresql/_postgres_version_/bin/postgres -i -d 1 -o -e -D [new_database_path] > [new_database_path]/server.log 2>&1 &

or (for older versions)

nohup /usr/local/pgsql/bin/postmaster -i -d 1 -o -e -D [new_database_path] > [new_database_path]/server.log 2>&1 &

The "-i" option is necessary to use postgreSQL with networking.
The use of the switch "-d [1-5]" is to include debug information in the server.log file; debug information is redirected with " > path/server.log 2>&1". Both this switch and the redirection can be omitted if not desired.

- Include the PL/pgSQL procedural language to the created database by. To do only once [Not necessary with recent postgreSQL versions]:

/usr/local/pgsql/bin/createlang plpgsql template1 --pglib "/usr/local/pgsql/lib/" (--pglib "path" can be omitted if "PGLIB" is defined)

Tip:
Another simple way to start / restart / stop the postmaster server is to use pg_ctl:
nohup pg_ctl -o "-i" -D [new_database_path] [start|reload|restart|stop] &
(e.g. nohup pg_ctl -o "-i" -D /path/my_database -m smart restart &)

If postgreSQL binaries and libs are not in /usr/local/pgsql/ (e.g. in /usr/lib/postgresql/9.1/bin/), then modify the previous paths.

Important:

The database holder (the superuser) will now own the postmaster process and the new databases to be created. Although this is possible, you will not want this general postgres superuser to directly edit your specific new 2-DE database. For that, let the superuser (who can physically be you or your system administrator) create another new user (named after the postgreSQL account you plan to use with the 2-DE database, possibly your own personal account). Make sure this new user do have the right to create both databases and other new users.

To create a new user that will own the 2-DE database, the superuser should execute the following SQL command:

CREATE USER newUser_loginID CREATEDB CREATEUSER

If a password is wished, then state it by:

CREATE USER newUser_loginID WITH PASSWORD 'newUser_loginPassword' CREATEDB CREATEUSER

The Make2D-DB II tool will automatically create an additional new postgreSQL user called (by default) 'select2d' who has no special rights except performing select queries on the 2-DE database public and common schema.
In case, as a postgreSQL user, you were not allowed the right to create another user, then ask the postgres/postmaster process owner (the superuser) to create a user called 'select2d' ( by typing the system command: "createuser select2d", or simply by executing the SQL command: "CREATE USER select2d" within a psql session). For more details, you can check the following link: PostgreSQL Technical Documentation (9.1).

[top]

1.2 THE DATABASE CONVERSION / INSTALLATION PROCESS AND THE WEB SERVER SETTING UP:

Prepare your data files by following the instructions given in the Readme: Preparation file

Go to the directory containing the Make2D-DB II scripts and generate/configure the two configuration files:

include.cfg

2d_include.pl

as well as the maps' file

- existing.maps (annotations of your maps)

by typing the command:

perl make2db.pl -m config

or, to specify an alternative location for your configuration files to be written to (which is very useful if you are planing to install in the future different databases):

perl make2db.pl -m config -c /some_path_of_your_configuration_files

This operation is fully described in the Readme: Configuration file.

There is another third file named basic_include.pl located in the config directory. This configuration file contains some extra definitions that you occasionally may want to change. If some of those definitions do not suite you, then fell free to change them by editing this file using any simple text editor (though, do not change any definition coming after the section labeled 'DO NOT EDIT OR CHANGE ANYTHING STARTING FROM HERE!!'). This script is a perl script that does not tolerate any syntax error. In case the tool complains about any modifications, there is a master version of this file, called basicIncludeInitial.pl, that is located in the lib2d directory. Simply restore the file from it.
If you are using the '-c /some_path_of_your_configuration_files' switch to write your configuration files in some loaction oher than the default config directory, then you may also wish to have a copy of this basic_include.pl file to be associated with this specific configuration. Simply copy the basic_include.pl file into the directory given by the '-c' switch and edit the file from there.

You are now ready to launch the database conversion / installtion process

Check you are in the Make2D-DB II directory, then type the command:

perl make2db.pl -m [option]

or - to specify an alternative location for your configuration files to be written or to be read:
perl make2db.pl -m [option] -c /some_path_of_your_configuration_files

option

nfig

check
-> Check the syntax and the consistency of your data to be converted, no further operation is launched. The process is interrupted when a major error is encountered.

check-report
-> Same as 'check', except it does not stop on major errors (check the last_STDOUT.log file from the main directory for a summary check report).

create
-> Create the relational schema for a new database from scratch (a void database), no data conversion is done.

transform
-> Combine the 'check' and 'create' options (as well as the 'server' option, see later). Checking database content, creating the relational schema and loading data into the new database. This option also prepare and install the files for the Web server.

update
-> For both updates of the schema structure and the database content (implemented only since version 2.50.1). To be used to add extra data or to update an older installation.

server
-> This option can be used independently if you wish to host an interface to query other remote databases without even having your own database. It is also to be used when the 'transform' option cannot fully set up the HTTP Web server pages due to invalid permissions. The script can then be *re-executed* with this option to only set up the HTTP Web server files (probably as the http Web server user, or with directly a 'sudo' command).

e.g. type 'perl make2db.pl -m transform' to convert your existing database into the new relational format and to set up the Web server.

For more details, type: perldoc make2db.pl or perl make2db.pl -help.

All output messages and error messages of your last check, transform or update operations are stored in parallel in 2 separate files, so you may check/inspect them at any moment [last_STDOUT.log and last_STDERR.log, to be found in the main directory]. For any encountered installation problem, please do always include those two files with your message when contacting us. The file last_STDOUT.log describes in details the check and installation process, but it also contains important messages and warnings relevant to your data conversion. It's always a good idea to have a look at this file after any installation - even if every thing seems to have been executed correctly - and look for any potential errors or warning messages, for example by searching in the file the two keywords: 'error' and 'warning'!

To check your data, without launching physically any database installation, use the '-m check' option. If the tool encounters any fatal error, it simply halts the process pointing you to the source of error. When too many errors seem to be present, you should better use the '-m check-report' to check all your data without interruption. You should then open and inspect the generated 'last_STDOUT.log' file and correct all the described sources of error within your data until no errors are left.

The option to convert a database is '-m transform'

This combines the '-m check, create and -m server' all together. Type:

perl make2db.pl -m transform
or
perl make2db.pl -m transform -c /some_path_of_your_configuration_files

The process starts by extracting the maps and spots annotations from the maps / list files. If not provided, an intermediate flat file (similar to SWISS-2DPAGE text file) is automatically generated for internal use. This file can also be edited manually by yourself to modify or add any data (following the SWISS-2DPAGE syntax). The file is then deeply checked, ambiguities are pointed out and errors are described. You will have to correct major errors yourself, as the syntax checker will stop the process and display the source of error (sometimes with a correction proposal). Otherwise, for some minor errors/ambiguities, you will be asked to make a choice between accepting some behavior or simply stop the process and correct them manually.

Once the syntax checking part has succeeded, the tool connects to the Internet to gather some additional external data related to your entries (if you have not disabled this function in your configuration files). It is highly recommended that you provide accession numbers similar to those of UniProtKB (cf. UniProtKB Accession numbers) or at least provide cross-references to Swiss-Prot/TrEMBL (UniProtKB) to get the maximum use of this external data gathering feature.

After that step, the database schema is constructed (schema and tables created, functions, rules and triggers loaded). Please, note that you should not run any parallel postgreSQL/psql session on your database (if it already exists) during this phase.

The data to be converted is then loaded into the relational tables. Follows the construction of the 'view' functions and tables (materialized tables dedicated to prepare static views in order to accelerate queries execution). This operation could be relatively long, depending on the size of your data. The database is completely converted once some additional technical information has been added to it.

The option to update a database (for both structure and content update) is '-m update'.

This combines the previous '-m transform' option to a procedure that reads your old data and makes re-use of it. Type:

perl make2db.pl -m update
or
perl make2db.pl -m update -c /some_path_of_your_configuration_files

Use this option if you want to upgrade your already running database to a newer version of the tool.
Use also this option if you want to modify or add some data to your already installed database.

If you are upgrading from a previous version of the tool, it is highly recommended to generate new configuration files (using the "-m config" option) before updating rather than using your older ones. Some configuration parameters are added or modified between versions.

In both cases (if you're upgrading the tool, or adding some new data), you will definitely need to provide the tool with all your data that you have already used during your previous installtion in order to keep it within the new installtion! You may of course modify some of your data, delete or add some other before running the tool. In fact, the "-m update" option does not directly add any data to the already existing database, instead, it erases it completely from the system before re-installing it again based on the availability of the provided data. The purpose of using this switch is that the tool keeps a record of all modified entries versions (general and 2D annotations incremented using the administration interface), as well as all meta-data related to the database publication (current database release, release history, description, comments, etc..).
In all cases, all your data is backuped by the tool before being erased. Whenever a technical poblem is encountered during the update process, your old database is automatically restored back and your previous data should be left untouched.

The next versions of this tool should progressively include an interactive update interface to modify, add or delete data directly within a running database. Given the relative complexity of the schema and the relations used in this tool, this work should initially offer only some very basic operations (erase or add a new map, a spots' list, identified proteins,..). In a later development phase, more complexe data (identification categories, experimental data, bibliographic references, cross-linking, etc..) should be also managed. A prototype of such an interface can, for the moment, be only acquired by contacting us directly.

The Web Server settings

The final step will be the preparation of the Web server interface.

If you are installing a Web portal without any (new) local database, then you will only need to run the tool using this server option (no need for any previous '-m [check|create|transform]' operation)

In the more common situations - when you are installing a new local database - if you do have full write permissions into the HTTP server directories, then things will simply go smoothly and you will not have to care about the following notes. Otherwise, you will be notified that you do not have the required permissions for some of the operations to be achieved. To achieve those operations, you should wait until the script has terminated and then re-execute the script again with a user ID that has write permissions into the HTTP server directories (apache, http,nobody user, root user, etc.. / see with your system administrator or use a 'su or a sudo' command)*. This time, you should use the option '-m server' with the script:

perl make2db.pl -m server
or
perl make2db.pl -m server -c /some_path_of_your_configuration/files

This operation should take very few seconds, as it skips all the previous operations already completed, except those concerning the server settings.

(Another option would be to create the needed directories manually and give them the right permissions. As this may be a quite complicated task, you may contact us for the details)

Note: If, for some reason, some default directories (icons, tmp, html) are not recognized by your Web server, you should declare them in the server configuration file. For Apache, edit the httpd.conf file (usually at '/etc/httpd/conf/httpd.conf') by adding blocks of the following format (see with your system administrator):

Alias /icons/ "/www/var/new_directory/"

<Directory "/www/var/new_directory">
     Options Indexes MultiViews
     AllowOverride None
     Order allow,deny
     Allow from all
</Directory>

the path '/www/var/new_directory' could be different on your system.
Once done, you should then restart the HTTP server for the changes to take effect.

End of Installation:

You will now find a copy of the server configuration file "2d_include.pl" in the new created server directory '/var/www/cgi-bin/2d/inc/' by default*. By editing this file at any moment, you will be able to change the behavior of your Web interfaces, e.g. colors, logo image, displayed names of the databases, copyright message, but also more technical parameters for the server and the various databases to connect to.
Another important configuration file, the already mentionned "basic_include.pl", is to be found in the same directory '/var/www/cgi-bin/2d/inc/'. This file will control very specific behavior of your interface, including how to deal with external data update and to define the potential other remote interfaces to link to with your own . It also controls many other data definitions, e.g. controlled vocabulary for the identification methods, HTTP server passwords, etc.. In general, you will seldom need to edit this file, except to add or modify the list of remote interfaces if needed.

*("/var/www/" is, in many systems, the default path to the Apache HTTP server directories. In some installations it may be located somewhere else)

[top]

1.3 QUERYING THE NEWLY CONVERTED DATABASE:

After installing your database, you should now be able to query it. In what follows, we suppose that you have configured your '$web_server_ref_name' variable to '2d' (cf. Readme: Configuration for more details). You will then need to type the following address in your browser (FireFox, Mozilla, IE, et..) to navigate through the query interface:

- http://your_domain_name/2d/ ( http://your_domain_name/html/2d/ in some systems )

(If you have set the server scripts and documents manually, then you should rather type: http://your_domain_name/cgi-bin/2d/2d.cgi)

Where your_domain_name is the name or IP address of your domain.

To access the management interface as the database administrator, type:

- http://your_domain_name/2d/admin

For the administration part, you will be asked to enter your postgres login_id and password (the postgres account used to create the database / password will not be required if the TCP/IP connection method is set to 'trust' within your postgres configuration file). You will also be asked for a database name if your interface is managing several databases / projects simultaneously. All your login information - encrypted - will definitely expire after 30 minutes of non activity, or when you decide to logout.

Remark: If you are working on the same machine that hosts your HTTP server, you may also replace your_domain_name by localhost:

- http://localhost/2d/

If you have activated the Apache redirection mode (by setting '$modeRewriteOn = 1' in the 2d_include.pl file), then it is time to include the generated rules into your Apache configuration file (usually at '/etc/httpd/conf/httpd.conf') and then restart the HTTP server (see with your system administrator). Those generated rules are listed in a file called 'mod_rewrite.txt' in the '/var/www/cgi-bin/2d/inc' directory (or similar). Copy them to the end of the Apache configuraion file, or simplay add this line, or similar, to the end of the configuration file:
"Include /var/www/cgi-bin/2d/inc/mod_rewrite.txt".

Finally, make sure your browser is configured to accept cookies and java scripts. Java should also be installed to run the small applet peak list viewer.

[top]

1.4 THE DATABASE CROSS-REFERENCE LINKS:

A default file listing some URL links (Web addresses) to different database cross-references (including those for the DR lines) is provided within this package (within the "text" directory). The file name is 'links.txt'. When you install a new database and allow external data retrieval, an extra file is then automatically downloaded from the ExPASy server and is added to this same directory (the file name is 'DbCrossRefs.txt'). If this file has been downloaded, then it will be used instead of the default links.txt to define your cross-reference links. You can let the tool use this file as it is, or choose to edit it yourself to update or add your own URLs.

You should not update this 'DbCrossRefs.txt' file directly from inside the "text" directory, the changes will apply to your subsequent installations but the file itself may be replaced again by a newer version extracted from the ExPASy server (if the file is older than some few days). The best way to make your changes permanent for a specific database is to update the file called 'links.txt' by editing it, after your installation is complete, from your Web server directory where it has been copied (by default the copy of this file should be found in '/www/var/cgi-bin/2d/inc/links.txt' or a similar path).

The easiest way to edit this file, is to edit it through a simple text editor. Another way - much safer - is to use the Perl module 'DbCrossRefs.pm' provided with this package (this module is also located in the "lib2d" directory, and is copied to your /www/var/cgi-bin/2d/inc' directory during the server installation process). Type 'perldoc DbCrossRefs.pm' for a brief description of the module, or simply read the DbCrossRefs.html file for more details.

Those files (links.txt, or the updated DbCrossRefs.txt) list the databases links one link per line. The format of each line is:

database_name database_URL

The database_name is the name of the database itself, while the database_URL is the URL address of the displayer of this database. e.g.

SWISS-2DPAGE http://world-2dpage.expasy.org/swiss-2dpage/{0}

The '{0}' is a pattern that the tool will replace by the given protein primary accession number related to this database.
If the database displayer requires more than one identifier, replace all the subsequent parameters by {1}, {2},.. and so on,

e.g: DATABASE_NAME http://genome-www5.stanford.edu/cgi-bin/SMD/source/sourceResult?criteria={0}&choice=Gene&option=Symbol&organism={1}

In additiom, each time you update the external data using the management (administration) interface, an additional file, called 'links.txt.uniprot' will be refreshed and placed in the server /html/2d/data/tmp directory. This downloaded file will become the primary source for your displayed cross-references URLs, as long as a variable called '$EXTERNAL::update_from_SwissProt' within your basic_include.pl configuration file is activated (not set to 0).

Note: whenever a cross-reference database is not listed in the links.txt nor in the links.txt.uniprot files, the query interface will try to extract it from a table called 'XrefDB' within the postgreSQL database itself.

[top]

1.5 PACKAGE CONTENTS:

Important: This section lists the files included within this package. Even if your are not interested in the technical aspect of the tool, it is highly recommended that you read both the data_test and data_test/examples sub-sections (with read titles), as they give some precision on the available formats given as examples for data preparation.

The main directory

This directory contains the main script to run the tool:

make2db.pl (the perl main script to run the tool)
License.txt (copyright and license terms)
version (current version of the tool)
last_STDOUT.log (the detailed output log file of the last operation launched by the main script / present only after the first time you launch the tool)
last_STDERR.log (the error output file of the last operation launched by the main script / present only after the first time you launch the tool)

The lib2d directory

This directory contains the different perl scripts and modules used by the main script, it also contains a copy of the configuration file 'basic_include.pl' under the name of 'basicIncludeInitial.pl':

make2db_CFG.pm (a module to set up the different configuration files)
make2db_ASCII_TABLES.pm (the data syntax checker and converter to generate the intermediate ASCII tables)
make2db_POSTGRES_DB.pm (builds the relational database structure and loads data into it)
make2db_util.pl (more routines and utilities for both the installation process and the Web interface external connections)
DbCrossRefs.pm (a module to manage the cross-reference databases' URLs)
MelanieXMLParser.pm (a module to parse Melanie XML exports)
miscellaneous.pl (special routines, includes the routine to generate the Apache Mode Rewrite rules and functions)
CRC64.pm (calculates the CRC64 of a string - courtesy of the Swiss-Knife developers)
convert_file (a small perl script to convert DOS files into UNIX ones)

The pgsql directory

This directory contains all the postgreSQL commands, rules, triggers and functions to build the new relational database:

make2db_tables.pgsql (postgreSQL commands to create the relational tables)
make2db_functions.pgsql (postgreSQL PL/pgSQL scripts to set up the database server-side functions)
make2db_triggers.pgsql (postgreSQL PL/pgSQL scripts to set up the database inner triggers)
make2db_update_internal.pgsql (postgreSQL PL/pgSQL scripts to update internal data relative to the external resources)
make2db_final_views.pgsql (postgreSQL PL/pgSQL scripts to build the auxiliary tables, the materialized views and the main update functions)

The text directory

This directory contains text and list files needed by the tool:

links.txt (a text file listing URL addresses to query/cross-reference different servers and databases)
tisslist_initial.txt (a prepared list of different tissues based on the UniProt tissue list - courtesy of the Swiss-Prot group, Geneva, SIB)
tisslist_aliases.txt (aliases for the tissue names listed in the tissue list)
tisslist.txt (an up-to-date list of tissues extracted from ExPASy - present only after first external connection)
DbCrossRefs.txt (an up-to-date list of URLs for cross-references to use instead of links.txt - present only after first external connection)
remoteInterfaces_initial.cfg (an initial list of related 2D-PAGE resources and their connection parameters to use for external data retrieval and search queries). Your local working version of the tool will be continously updated via the ExPASy mediator.

The src directory

This directory contains the source code for independent utilities :

<InSilicoSpectro> (this is the directory containing the Make2D-DB II modified version of the Phenyx module 'InSilicoSpectro', used to read and convert MS data)
<graphApplet> (source code of the java applet to display mass spectra)

The config directory

-m config

basic_include.pl (this is the non-generated basic configuration file that you may only edit manually)
config.cfg (your last generated personal configuration file for the database creation/conversion process)
2d_include.pl (your last generated personal configuration file for the Web interface behavior and appearance)

Your two last personal files, if they already exist, will take the suffix '.old' whenever you generate new ones (include.cfg.old and 2d_include.pl.old).

The data directory

By default, you can place inside this directory all your personal data files of the database to be converted or updated (your reports or flat file, images and the existing.maps file - the hidden*.txt files are also expected to be placed here, those are the ones to control what exactly of your data is to be kept private and reserved for privileged users).

The data_test directory

This directory contains a set of data for a test database, so you can test the tool with them. You can either define this directory in the configuration file as the one containing your data, or copy all of its contents to the default

'data'

directory.

existing.maps (a tab-delimited file listing the gel names, their dimension and related annotations; such a file is generated interactively during the configuration process)
test.dat (a flat file for a test database)
PLASMA.png (a graphic image of the map called PLASMA, to be displayed within the Web interface)
PLASMA2.png ((a graphic image of the map called PLASMA2, to be displayed within the Web interface)
small_PLASMA.png (a small graphic image of the PLASMA gel to be displayed as a thumbnail within the Web interface)
small_PLASMA2.png (a small graphic image of the PLASMA2 gel to be displayed as a thumbnail within the Web interface)
PLASMA.txt (a tab-delimited text report - excel export - listing the spots' coordinates on the PLASMA gel as well as the identified proteins, it also includes some other required, optional or free-text annotations - headers are required in this case)
PLASMA2.txt (a tab-delimited text report - excel export - listing the spots' annotations on the PLASMA2 gel, this report emphasizes the inclusion of MS files and external documents to be integrated within the database)
Export.xml (a Melanie XML export for both the PLASMA and PLASMA2 maps < generated from "Melanie IV/ImageMaster 2D Platinum" by: File -> Export -> Gel Data to XML... >, the name of the file is irrelevant as it only needs to have the '.xml' extension to be recognized by the tool)
hiddenEntries.txt (the list of the accession numbers to be hidden from public access, only privileged users would access those entries)
hiddenGels.txt (the list of the gels to be hidden from public access, only privileged users would access those gels)
hiddenSpot.txt (this list define which spots within an entry is to be hidden from public access, it also controls which experimental identification data, e.g. MS data - are not to be publicaly shown)
make2DDBLogo.gif (a graphic logo to be displayed within the Web interface)
reference.txt (a text file giving the details of 3 bibliographic references based on the Swiss-Prot annotation syntax)
subtitle.html (a template HTML page to edit that will be included within the Web interface home page "as is" and placed directly under the title section)

Here, data needs only be extracted for the two maps of interest (PLASMA and PLASMA2) from only one of the three alternatives formats present here (either the CSV tab-delimited reports, the flat file or the Melanie XML export). This is to be defined within the configuration files. You may test all of those available formats.

Two additional directories <gel_doc> and <ms_files> contain the documents declared in the existing.maps file (gel annotation documents) and in the PLASMA2.txt (different MS files in pkl, dta, mgf and mzData format, as well as a Mascot search report in dat format). Those directories are not required by the package to store this type of files: You can define the location of your documents and mass spectra files to be read from anywhere you wish.

The data_test/examples directory

As the tool works and understands different types and formats for data input, this directory contains a set of different additional examples on how to prepare data for the installation process. The format you are likely to adopt for your data depends on the availability of the data you have and on the granularity of the annotations you would like to include in your database.

PLASMA.excel_example.txt, PLASMA2.excel_example.txt, PLASMA3.excel_example.txt (CSV tab-delimited text reports - excel export - listing the spots coordinates on three different gels, there are various types of annotations in each of those files - headers are required)
PLASMA3.excel_example_noheaders.txt (also a tab-delimited text report, except that it does not contain special/personal annotations, it follows a default structure that will be described later - no headers are required)
Melanie_XML_example.xml (a Melanie XML export for both the PLASMA and PLASMA2 maps < generated from "Melanie IV/ImageMaster 2D Platinum" by: File -> Export -> Gel Data to XML... > )
test.dat (a flat file - SWISS-2DPAGE-like, of the test database)
PLASMA.report_example.txt and PLASMA2.report_example.txt (two text reports listing the spots coordinates for the PLASMA and PLASMA2 gels, they also include relative Od and volume values, those files are only to be used when a simple flat file is the source of annotations and no Melanie export is present - headers are not required)
existing.maps.detailed.example (a tab-delimited file listing the gel names, their dimension and related annotations; such a file can be generated interactively during the configuration process)
existing.maps.noSpecies.example (like the previous, except that it is manually written and contains very few annotations - this is the old format used with the previous versions of the tool and is still accepted for compatibility reasons)
existing.maps (like the previous, except this is the minimum basic format for this file - deprecated!)
example.include.cfg and example.2d_include.cfg (an example of those two interactively generated configuration files)
hiddenEntries.txt, hiddenGels.txt and hiddenSpots.txt (the files controlling which entries, gels, spots and experimental data are to be hidden from public access)
PLASMA.png, PLASMA2.png and PLASMA3.png (the graphic images of the three gels)
small_PLASMA.png, small_PLASMA2.png and small_PLASMA3.png (thumbnails for the three gels)
PLASMA.mel (a Melanie annotated image of a gel nammed 'PLASMA' - this format is deprecated, as direct extraction of annotations from Melanie images has been abandoned - not present in distributions marked as 'light')

The http_server/cgi-bin directory

2d.cgi (the main Web script to query the database)
2d_view_map.cgi (the graphic map viewer)
GDchart.pl (the GD based spectra navigator / viewer)
inc/2d_util.pl (a collection of routines and functions called by the previous two scripts)
inc/2d_util.core.pl (a collection of routines and functions called by the management / administration cloned version of the main Web script)

The http_server/divers directory

GraphApplet.jar (the compiled java applet to display mass spectra)

The http_server/icons directory

2d_menu* (the clickable images for the graphic menus)
cross* (a collection of markers used by the map viewer to show the spots)
some logo images, etc..

The lib directory

This directory contains some of the public libraries that you may need to install on your system or related information on how to get and install them when needed. The content of this directory do not follow the general license terms stated in License.txt file on the main directory.

The readme directory

This is where the readme and HTML files are located:

1.Readme_main.html (the main readme file: introduction and installation)
2.Readme_preparation.html (how to prepare your data to be converted)
3.Readme_configuration.html (description on how to create and set the configuration files)
4.Readme_interface.html (details on how to use the query interface, on how to set up the redirection rules using 'mod_rewrite', on how to extract data or objects and on how to maintain and update the database)
changes.txt (the most recent changes)
DbCrossRefs.html (manual for the DbCrossRefs.pm module)
perldoc.css (stylesheet for DbCrossRefs.html)
manch2d.html (a recent version of the SWISS-2DPAGE user manual)
Copyright_for_external_data.txt (some copyright concerning data extracted from the Web)
hiddenEntries.txt, hiddenGels.txt and hiddenSpots.txt (self explanatory files on how to make some data private)

A sub-directory called 'examples' holds some files used by the readme files.

Another sub-directory, called 'database_schema' includes 3 sets of documentation files presenting the database four relational schema: structure and functions.
This sub-directory is not present in distributions marked as 'light'.

The first set of files, prefixed by 'all', contains the whole database structure. The other 4 sets contain, each, the structure of a specific schema (common, core, log and public schema):

*.html: a set of 5 HTML files
*.dia: a set of 5 dia files (can be graphically viewed by the 'dia' program)
*.xml: a set of 5 XML files

The temp directory

This directory will be used by the tool to store temporary files.
This directory will also contain your previous databases' dumps (backups), as all already existing databases will be 'dumped' by the tool whenever a new installation or an update is replacing an existent database.
You can delete anything from this directory.

Note: depending on your use of the tool, some more additional files will be generated or downloaded into your package directories.

[top]

1.6 DATABASE SCHEMA:

Relational postgreSQL database schema in HTML summary:

[top]

1.7 TO DO LIST:

Continue to integrate and harmonize with the standardization recommendations to be issued from the PSI proposals (protocols, Gel documents, MS data and analysis..)
Extend dynamic links with external 2D data resources based on the maps species and tissues
Propose a related XML schema based on a Gel / Spot approach, and generate exports from the database in this format
Distribue the prototype of the management interface to interactively modify and add data
Integrate HPLC data (under evaluation)

1.8 CONTACT:

- For an up-to-date documentation, news and FAQ, as well as the latest version, visit the The Make-2D-DB II Site

- For more information, visit the Make2D-DB II site at world-2dpage.expasy.org/make2ddb

- For any question or comment, please feel free to send an e-mail to: World-2DPAGE

AUTHOR

Khaled Mostaguir -- Swiss Institute of Bioinformatics.

For any question, suggestion or comment: Please, contact us.

[top]

The Make2D-DB II Package

current version: 3.10.2 -- December 2011

Read-Me: Main

Related documents:

Introduction

Installation

1.1 BEFORE USING THE TOOL, YOU SHOULD:

1.2 THE DATABASE CONVERSION / INSTALLATION PROCESS AND THE WEB SERVER SETTING UP:

1.3 QUERYING THE NEWLY CONVERTED DATABASE:

1.4 THE DATABASE CROSS-REFERENCE LINKS:

1.5 PACKAGE CONTENTS:

1.6 DATABASE SCHEMA:

1.7 TO DO LIST:

1.8 CONTACT:

AUTHOR

Khaled Mostaguir -- Swiss Institute of Bioinformatics.

Related documents:

For any question, suggestion or comment: Please, contact us.

The Make2D-DB II Package

current version: 3.10.2 -- December 2011

Read-Me: Main

Related documents:

Introduction

Installation

1.1 BEFORE USING THE TOOL, YOU SHOULD:

1.2 THE DATABASE CONVERSION / INSTALLATION PROCESS AND THE WEB SERVER SETTING UP:

1.3 QUERYING THE NEWLY CONVERTED DATABASE:

1.4 THE DATABASE CROSS-REFERENCE LINKS:

1.5 PACKAGE CONTENTS:

1.6 DATABASE SCHEMA:

1.7 TO DO LIST:

1.8 CONTACT:

AUTHOR Khaled Mostaguir -- Swiss Institute of Bioinformatics.

Related documents:

For any question, suggestion or comment: Please, contact us.

AUTHOR

Khaled Mostaguir -- Swiss Institute of Bioinformatics.